 experimental result show


07211688a0869d995947a8fb11b215d6-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all the anonymous reviewers for their constructive feedback. We address each comment as follows. R1-Q2: Just using the predicted mask to concat. R1-Q3: Refine the predicted mask with CRF. SEAM shows that CRF (vs. CONTA) is only effective in the first round, i .


Fast and accurate randomized algorithms for low-rank tensor decompositions

Neural Information Processing Systems

Low-rank Tucker and CP tensor decompositions are powerful tools in data analytics. The widely used alternating least squares (ALS) method, which solves a sequence of over-determined least squares subproblems, is costly for large and sparse tensors. We propose a fast and accurate sketched ALS algorithm for Tucker decomposition, which solves a sequence of sketched rank-constrained linear least squares subproblems. Theoretical sketch size upper bounds are provided to achieve $O(\epsilon)$ relative error for each subproblem with two sketching techniques, TensorSketch and leverage score sampling. Experimental results show that this new ALS algorithm, combined with a new initialization scheme based on the randomized range finder, yields decomposition accuracy comparable to the standard higher-order orthogonal iteration (HOOI) algorithm. The new algorithm achieves up to $22.0\%$ relative decomposition residual improvement compared to the state-of-the-art sketched randomized algorithm for Tucker decomposition of various synthetic and real datasets. This Tucker-ALS algorithm is further used to accelerate CP decomposition, by using randomized Tucker compression followed by CP decomposition of the Tucker core tensor. Experimental results show that this algorithm not only converges faster, but also yields more accurate CP decompositions.
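The idea of replacing a tall least squares subproblem with a smaller sketched one can be illustrated on an ordinary (unconstrained) problem. The following is a minimal sketch only: it uses a dense Gaussian sketching matrix as a stand-in for the TensorSketch and leverage score sampling techniques named in the abstract, it ignores the rank constraint, and all function names are hypothetical.

```python
import random


def gaussian_sketch(rows, rhs, sketch_size, rng):
    """Return (S @ A, S @ b) for a Gaussian sketch S with sketch_size rows.

    Assumption: a dense Gaussian sketch is used here for simplicity; it is
    not the TensorSketch or leverage score sampling of the abstract.
    """
    n_cols = len(rows[0])
    scale = 1.0 / sketch_size ** 0.5
    sketched_rows = [[0.0] * n_cols for _ in range(sketch_size)]
    sketched_rhs = [0.0] * sketch_size
    for i in range(sketch_size):
        for a_row, b_val in zip(rows, rhs):
            s = rng.gauss(0.0, 1.0) * scale  # entry S[i, j]
            for k in range(n_cols):
                sketched_rows[i][k] += s * a_row[k]
            sketched_rhs[i] += s * b_val
    return sketched_rows, sketched_rhs


def solve_two_column_least_squares(rows, rhs):
    """Solve min ||A x - b|| for a two-column A via the normal equations."""
    g11 = sum(r[0] * r[0] for r in rows)
    g12 = sum(r[0] * r[1] for r in rows)
    g22 = sum(r[1] * r[1] for r in rows)
    c1 = sum(r[0] * b for r, b in zip(rows, rhs))
    c2 = sum(r[1] * b for r, b in zip(rows, rhs))
    det = g11 * g22 - g12 * g12
    return [(g22 * c1 - g12 * c2) / det, (g11 * c2 - g12 * c1) / det]


# Over-determined toy problem: 200 equations, 2 unknowns.
rng = random.Random(0)
ts = [i / 200 for i in range(200)]
A = [[1.0, t] for t in ts]
b = [2.0 + 3.0 * t for t in ts]  # noiseless, so the exact solution is [2, 3]

x_exact = solve_two_column_least_squares(A, b)

# Sketch down to 20 equations and solve the smaller problem instead.
SA, Sb = gaussian_sketch(A, b, sketch_size=20, rng=rng)
x_sketch = solve_two_column_least_squares(SA, Sb)
```

Because the toy right-hand side is noiseless, the sketched problem recovers the same solution as the full one up to rounding. With noisy data the sketched solution is only approximate, and the sketch size controls how close its residual is to the optimal one.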



MSTIM: A MindSpore-Based Model for Traffic Flow Prediction

Qin, Weiqi, Liu, Yuxin, Wu, Dongze, Qin, Zhenkai, Luo, Qining

arXiv.org Artificial Intelligence

To address the low accuracy and large error fluctuation of traditional traffic flow prediction models when dealing with multi-scale temporal features and dynamic change patterns, this paper proposes MSTIM, a multi-scale time series information modelling model based on the MindSpore framework. It integrates long short-term memory networks (LSTMs), convolutional neural networks (CNNs), and the attention mechanism to improve modelling accuracy and stability. Experiments used the Metropolitan Interstate Traffic Volume (MITV) dataset and compared MSTIM with typical LSTM-attention, CNN-attention, and LSTM-CNN models. The experimental results show that the MSTIM model achieves better results on Mean Absolute Error (MAE), Mean Square Error (MSE), and Root Mean Square Error (RMSE), indicating a significant improvement in the accuracy and stability of traffic volume prediction.
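For reference, the three reported metrics can be computed as follows. This is a minimal sketch using the standard definitions; the traffic volumes below are made-up illustrative numbers, not values from the MITV dataset or the paper.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between targets and predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average squared difference between targets and predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, expressed in the same unit as the targets."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Hypothetical hourly traffic volumes and model predictions.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 190.0, 270.0]

mae = mean_absolute_error(actual, predicted)       # 12.5
mse = mean_squared_error(actual, predicted)        # 175.0
rmse = root_mean_squared_error(actual, predicted)  # about 13.23
```

MSE and RMSE penalize the single large error (20 vehicles) more heavily than MAE does, which is why the three metrics are usually reported together.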


How to optimize K-means?

Li, Qi

arXiv.org Artificial Intelligence

Center-based clustering algorithms (e.g., K-means) are popular for clustering tasks, but they usually struggle to achieve high accuracy on complex datasets. We believe the main reason is that traditional center-based clustering algorithms identify only one clustering center in each cluster. When the distribution of the dataset is complex, a single clustering center cannot strongly represent distant objects within the cluster. Optimizing existing center-based clustering algorithms is therefore a valuable research direction. In this paper, we propose a general optimization method called ECAC, which can optimize different center-based clustering algorithms. ECAC is independent of the clustering principle and is embedded as a component between the center process and the category assignment process of center-based clustering algorithms. Specifically, ECAC identifies several extended-centers for each clustering center. The extended-centers act as relays to expand the representative capability of the clustering center in a complex cluster, thus improving the accuracy of center-based clustering algorithms. We conducted numerous experiments to verify the robustness and effectiveness of ECAC. ECAC is robust to diverse datasets and diverse clustering centers. After ECAC optimization, the accuracy (NMI as well as RI) of center-based clustering algorithms improves by an average of 33.4% and 64.1%, respectively, and even K-means accurately identifies complex-shaped clusters.
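To make the two processes mentioned above concrete, here is a minimal sketch of plain K-means (Lloyd's iteration) with the center step and the assignment step written as separate functions. It is not an implementation of ECAC: the abstract does not give enough detail to reproduce the extended-centers, and the function names are hypothetical.

```python
def assign_to_nearest_center(points, centers):
    """Category assignment process: label each point with its closest center."""
    labels = []
    for x, y in points:
        distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centers]
        labels.append(distances.index(min(distances)))
    return labels


def update_centers(points, labels, centers):
    """Center process: move each center to the mean of its assigned points."""
    new_centers = []
    for k, old_center in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centers.append(old_center)  # keep an empty cluster's center
            continue
        new_centers.append((
            sum(p[0] for p in members) / len(members),
            sum(p[1] for p in members) / len(members),
        ))
    return new_centers


def k_means(points, initial_centers, max_iterations=100):
    """Alternate the two processes until the labels stop changing."""
    centers = list(initial_centers)
    labels = assign_to_nearest_center(points, centers)
    for _ in range(max_iterations):
        centers = update_centers(points, labels, centers)
        new_labels = assign_to_nearest_center(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels


# Two well-separated square blobs.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centers, labels = k_means(points, initial_centers=[(0, 0), (10, 10)])
```

Each point is compared against exactly one center per cluster in `assign_to_nearest_center`, which is the single-center limitation the abstract describes. According to the abstract, ECAC would sit between `update_centers` and `assign_to_nearest_center`.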


Meta-Learning Loss Functions for Deep Neural Networks

Raymond, Christian

arXiv.org Artificial Intelligence

Humans can often quickly and efficiently solve complex new learning tasks given only a small set of examples. In contrast, modern artificially intelligent systems often require thousands or millions of observations in order to solve even the most basic tasks. Meta-learning aims to resolve this issue by leveraging past experiences from similar learning tasks to embed the appropriate inductive biases into the learning system. Historically, methods for meta-learning components such as optimizers, parameter initializations, and more have led to significant performance increases. This thesis aims to explore the concept of meta-learning to improve performance through the often-overlooked component of the loss function. The loss function is a vital component of a learning system, as it represents the primary learning objective, where success is determined and quantified by the system's ability to optimize for that objective successfully.


LB-KBQA: Large-language-model and BERT based Knowledge-Based Question and Answering System

Zhao, Yan, Li, Zhongyun, Pan, Yushan, Wang, Jiaxing, Wang, Yihong

arXiv.org Artificial Intelligence

Generative Artificial Intelligence (AI), because of its emergent abilities, has empowered various fields. One of its typical application fields is large language models (LLMs), whose natural language understanding capability is dramatically improved compared with conventional AI-based methods. Natural language understanding has always been a barrier to the intent recognition performance of Knowledge-Based Question and Answer (KBQA) systems, owing to linguistic diversity and newly appearing intents. Conventional AI-based methods for intent recognition can be divided into semantic parsing-based and model-based approaches. However, both suffer from limited resources in intent recognition. To address this issue, we propose a novel KBQA system based on a Large Language Model (LLM) and BERT (LB-KBQA). With the help of generative AI, our proposed method can detect newly appearing intents and acquire new knowledge. In experiments on financial domain question answering, our model demonstrated superior effectiveness.


Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS

Liang, Ziqi

arXiv.org Artificial Intelligence

Recurrent neural networks (RNNs) have become a standard modeling technique for sequential data and are used in novel text-to-speech models. However, training a TTS model that includes RNN components requires powerful GPU performance and takes a long time. In contrast, CNN-based sequence synthesis techniques can significantly reduce the training time of a text-to-speech model while guaranteeing a certain performance due to their high parallelism. We propose a novel text-to-speech system based on deep convolutional neural networks, which does not employ any RNN components (recurrent units) and is a two-stage training end-to-end TTS model. To address the low-resource problem of lacking labeled data, we improve the robustness of our model with a series of data enhancement methods such as time warping, frequency masking and time masking. The final experimental results show that a TTS model using only CNN components can reduce the training time while ensuring the quality and naturalness of the synthesized speech compared to mainstream TTS models, such as Tacotron2 and the vocoder Hifigan.
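Two of the data enhancement methods named above, frequency masking and time masking, can be sketched on a spectrogram stored as a list of frequency rows. This is an illustrative sketch only: masking with zeros, the mask-width parameters, and the function names are assumptions, the abstract does not state the paper's exact settings, and time warping is omitted.

```python
import random


def frequency_mask(spectrogram, max_width, rng):
    """Zero out a random band of consecutive frequency rows."""
    n_freq = len(spectrogram)
    width = rng.randint(0, min(max_width, n_freq))
    start = rng.randint(0, n_freq - width)
    return [
        [0.0] * len(row) if start <= f < start + width else list(row)
        for f, row in enumerate(spectrogram)
    ]


def time_mask(spectrogram, max_width, rng):
    """Zero out a random span of consecutive time frames in every row."""
    n_time = len(spectrogram[0])
    width = rng.randint(0, min(max_width, n_time))
    start = rng.randint(0, n_time - width)
    return [
        [0.0 if start <= t < start + width else value
         for t, value in enumerate(row)]
        for row in spectrogram
    ]


# Toy spectrogram: 8 frequency bins by 20 time frames, all ones.
rng = random.Random(0)
spectrogram = [[1.0] * 20 for _ in range(8)]

# Apply both masks; the input spectrogram itself is left unmodified.
augmented = time_mask(frequency_mask(spectrogram, 3, rng), 5, rng)
```

Each call returns a new spectrogram of the same shape, so the masks can be chained and applied freshly at every training step to produce varied inputs from a small labeled set.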


A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI

Zhang, Chenshuang, Zhang, Chaoning, Zheng, Sheng, Zhang, Mengchun, Qamar, Maryam, Bae, Sung-Ho, Kweon, In So

arXiv.org Artificial Intelligence

Generative AI has demonstrated impressive performance in various fields, among which speech synthesis is an interesting direction. With the diffusion model as the most popular generative model, numerous works have attempted two active tasks: text to speech and speech enhancement. This work conducts a survey on audio diffusion models, which is complementary to existing surveys that either lack the recent progress of diffusion-based speech synthesis or highlight an overall picture of applying diffusion models in multiple fields. Specifically, this work first briefly introduces the background of audio and diffusion models. As for the text-to-speech task, we divide the methods into three categories based on the stage where the diffusion model is adopted: acoustic model, vocoder and end-to-end framework. Moreover, we categorize various speech enhancement tasks by whether certain signals are removed from or added to the input speech. Comparisons of experimental results and discussions are also covered in this survey.